Abstract: Radio Frequency IDentification (RFID) technology promises to revolutionize 
the way we track items and assets. In RFID systems, however, missed readings are common 
and pose an enormous challenge to RFID data management, so accurate data cleaning is 
essential for the successful deployment of such systems. In this paper, we present the 
design and development of an RFID data cleaning system, the first declarative, 
behavior-based smoothing system for unreliable RFID data. We take advantage of the 
kinematic characteristics of tags to assist in RFID data cleaning. To establish the 
conversion relationship between RFID data and the kinematic parameters of the tags, we 
propose a movement behavior detection model. Moreover, a Reverse Order Filling 
Mechanism is proposed to capture the movement behavior characteristics of tags more 
completely. Finally, we validate our solution with a common RFID application and 
demonstrate the advantages of our approach through extensive simulations. 

Keywords: RFID technology; missed reading; data cleaning; movement behavior; kinematic 
characteristics 



1. Introduction 

Radio Frequency Identification (RFID) is an electronic tagging technology that allows objects, 
places, or persons to be automatically identified at a distance without a direct line-of-sight, using an 
electromagnetic challenge/response exchange [1,2]. RFID offers a possible alternative to barcodes, and 
has emerged as a key technology for a wide range of applications, including supply chain, retail stores, 
and asset management [3]. However, the widespread adoption of RFID technology is limited by the 
unreliability of the data streams produced by RFID readers [4,5]. RFID data cleaning is therefore 
widely considered a principal challenge and has been an important research topic in the last few 
years [6-8]. 

Despite improvements in the accuracy of RFID readers, erroneous readings such as missed readings 
and ghost readings still occur, due to interference, inappropriate placement of tags, or temporary or 
permanent malfunction of some components. 

The goal of RFID data cleaning is to eliminate erroneous readings, and especially to reduce or 
eliminate dropped readings. In this paper, we propose an innovative approach to cleaning raw RFID 
data: Behavior-Based Smoothing for unreliable RFID data (BBS). Unlike conventional techniques, 
BBS relies primarily on the movement behavior of tags to fill in the RFID data. The biggest obstacle is 
how to obtain the movement behavior characteristics of tags. To address this problem, we propose a 
movement behavior detection model that derives these characteristics by analyzing the existing 
uncertain data of the corresponding tags. The contributions of this study are as follows: 

A movement behavior detection model. By counting how often each tag is read in each cycle, 
we obtain the read rate of the tags and analyze their kinematic characteristics from the 
changes in the read rate sequences, which ultimately assists in RFID data cleaning. 
Reverse Order Filling Mechanism (ROFM). Based on the detection model, we design and 
implement a reversible RFID data filter. When we detect that the data have not been filled 
completely, ROFM is started to fill the data again in reverse order. The mechanism 
captures the movement behavior characteristics of tags more completely, and thus 
significantly improves the accuracy of data cleaning without scanning all the data twice. 
Improved positioning accuracy of the RFID reader. A traditional RFID positioning system can 
only provide a Boolean result, namely whether the tag is in the read range of the 
reader at a given time. BBS can also estimate the distance between the tag and the reader, and even 
the velocity of the tag. 
Evaluation of the effect of BBS. We design several groups of contrast experiments on data sets 
that include both measured data and simulated data. The results show that, under all tested 
missing rates, the precision of BBS is better than that of sliding-window 
cleaning. 

The rest of this paper is organized as follows: we discuss related work in Section 2. Section 3 
defines the movement behavior detection model and introduces our RFID data cleaning mechanism 
and algorithm. An empirical evaluation of our solution is reported in Section 4. Finally, Section 5 
concludes the paper. 

2. Related Work 

RFID technology has posed many challenges to database management systems, such as 
supporting large data volumes [9-11], handling new types of queries [11], event 
processing, and data cleaning [5,12-16]. 

Many systems have been developed to manage uncertain data. RFID data management is one 
of the most important applications driving the recent surge of interest in managing incomplete 
and uncertain data, and it has been studied extensively. Valentine et al. [8] presented an adaptive 
sliding-window based approach, WSTD, for reducing false negative reads in RFID data streams. 
Rao et al. [13] presented a deferred approach for detecting and correcting RFID data anomalies by 
utilizing declarative sequence-based rules. Chen et al. [14] proposed a Bayesian inference based 
approach, which takes full advantage of data redundancy, for cleaning raw RFID data. Gonzalez et al. [15] 
proposed a cleaning framework that takes an RFID data set and a collection of cleaning methods with 
associated costs, and induces a cleaning plan that optimizes the overall accuracy-adjusted cleaning 
cost by determining the conditions under which inexpensive methods are appropriate and those 
under which more expensive methods are absolutely necessary. 

The work in [5,12] is the most relevant to this paper. Jeffery et al. [5,12] proposed an 
adaptive smoothing filter, SMURF, for RFID data cleaning. SMURF focuses on a sliding-window 
aggregate that interpolates for lost readings. It models the unreliability of RFID readings by 
treating RFID streams as a statistical sample of physical tags, and exploits techniques from sampling 
theory to drive its cleaning processes. However, it is mainly suited to circumstances in which tags 
move infrequently, and it is not effective when tags move frequently. 

3. Unreliable RFID Data Cleaning 

3.1. A Movement Behavior Detection Model 

The key to a movement behavior-based smoothing filter lies in how to establish the conversion 
relationship between read rate sequences and the kinematic parameters of tags to assist in RFID data 
cleaning. To do so, we propose a movement behavior detection model. 

The passage of a tag through the reader's read range follows the laws of kinematics. An important 
feature of kinematic parameters such as displacement and velocity is that they change 
continuously rather than abruptly. Hence, if the location of the tag (which mainly refers to the 
distance between tag and reader) and its relative velocity at a given time can be obtained from the 
original data, we can infer the parameters of the tag at the time of a missed reading from these 
parameters and their trends, and use them to assist in data cleaning and improve its accuracy. BBS uses 
this approach. For example, it analyzes existing tag data to obtain the location $p_1$ and the velocity 
$v_1$ of the tag at time $t_1$, from which the relative location of the tag at time $t_1 + T$ (where $T$ is 
a short period of time) can be approximately inferred. Finally, by mapping the location information 
back to the RFID data, we can fill in the missed RFID data. Through these kinematic parameters, BBS 
can therefore determine whether the tag is in the detection range at a given time, and further give its 
specific location. 

Adopting statistical methods similar to those of SMURF, we view each epoch as an independent 
Bernoulli trial with success probability $p_i$ [12]. An epoch may be specified as a number of 
interrogation cycles or a unit of time; a typical epoch lasts 0.2-0.25 seconds [5]. For each epoch, 
the reader keeps track of all the tags that have been identified, along with additional information such 
as the number of interrogation responses for each tag and the last time the tag was read. Assuming 
there are $n$ interrogation cycles in an epoch and that $tag_i$ is read in $m_i$ of them, the read rate of 
$tag_i$ in that epoch is $p_i = m_i / n$. While passing through the reader's read range, a tag is 
scanned continuously. Throughout this process, the read rate of the tag is not constant but 
changes with the distance between the tag and the reader. Moreover, experiments have shown that 
within the reader's detection region there is a linear relationship between read rate $p$ and 
distance $s$ [12]. For a specific reader, the detection range $S$ is a constant. To confirm this conclusion, we 
carried out similar experiments; the results are shown in Figure 1. The quiet condition 
means an ideal working environment for RFID devices with only a little interference, while the noisy 
condition means a working environment with more interference. 
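
The read rate defined above can be computed directly from the responses collected in one epoch. The following is a minimal sketch, assuming a hypothetical input format in which each successful interrogation response contributes one tag ID to a list; the function name `read_rates` is illustrative and not part of the original system.

```python
from collections import Counter


def read_rates(responses, n_cycles):
    """Return {tag_id: read rate} for a single epoch.

    responses: iterable of tag IDs, one entry per successful interrogation
               response in the epoch (hypothetical input format).
    n_cycles:  number of interrogation cycles n in the epoch.
    """
    if n_cycles <= 0:
        raise ValueError("n_cycles must be positive")
    counts = Counter(responses)  # m_i for each tag
    return {tag: m / n_cycles for tag, m in counts.items()}


# Tag "A" answered in 3 of 4 cycles, tag "B" in 1 of 4.
rates = read_rates(["A", "B", "A", "A"], n_cycles=4)
```

A tag that is never read in the epoch does not appear in the result, which corresponds to a read rate of 0.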



Figure 1. Read rate of tags in different conditions: (a) quiet condition; (b) noisy condition. 

By further abstracting the conclusions above, we obtain the relationship between read rate $p$ and 
distance $s$ shown in Figure 2. The distance $s$ between tag and reader and the read rate $p$ satisfy: 

$$p = \begin{cases} 0, & ks + b < 0 \\ ks + b, & 0 \le ks + b \le 1 \\ 1, & ks + b > 1 \end{cases} \qquad (1)$$

where $b = -kS$ and $k$ is the slope of the line. Since $p = 0$ for $s > S$, the slope $k$ is negative, 
and the above equation can be further written as: 

$$p = \begin{cases} 0, & s > S \\ k(s - S), & 0 \le k(s - S) \le 1 \\ 1, & k(s - S) > 1 \end{cases} \qquad (2)$$


Figure 2. Relationship between read rate and distance. 
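
Equation (2) amounts to clamping a straight line to the interval [0, 1]. The sketch below shows this under the assumption that the slope $k$ is negative, which follows from $b = -kS$ together with $p = 0$ for $s > S$. The function name and the numeric values in the example are hypothetical and chosen only for illustration.

```python
def read_rate_from_distance(s, k, S):
    """Read rate p for a tag at distance s, following Equation (2).

    k is the (negative) slope of the read-rate line and S is the
    detection range of the reader.
    """
    if k >= 0:
        raise ValueError("k is assumed to be negative")
    # Clamp the linear value k * (s - S) to the valid range [0, 1].
    return min(1.0, max(0.0, k * (s - S)))


# Hypothetical reader: detection range 6 m, slope -0.25 per meter.
inside = read_rate_from_distance(4.0, k=-0.25, S=6.0)   # linear region
outside = read_rate_from_distance(7.0, k=-0.25, S=6.0)  # beyond S
near = read_rate_from_distance(1.0, k=-0.25, S=6.0)     # saturated
```

With these values, the tag at 4 m lies in the linear region, the tag at 7 m is outside the detection range, and the tag at 1 m is close enough that the read rate saturates at 1.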

3.2. Behavior-Based Smoothing for Unreliable RFID Data 

In this section, we discuss how to use the model to fill in missed RFID data. In our model, the 
epoch is the basic unit of RFID data streams, and our task is to fill in the missing epoch information. 
The information we obtain from an RFID data stream includes the tag ID, the number of interrogation 
responses for each tag in an epoch, and the time of the epoch, in the form {tag ID, Response 
number, time}. Consider Equation (2). The read rate $p$ can be calculated from the Response 
number, and the detection range $S$ is a constant, but the distance $s$ cannot be calculated directly. In 
practice, the detection region of each reader is generally not very large, ranging from a few meters to 
tens of meters. Therefore, the movement of persons, vehicles, goods on a conveyor belt, and other 
tagged items through the detection region can be approximately considered as uniform linear motion 
or a combination of several successive uniform linear motions. Even if the velocity and 
direction of an object change noticeably during this process, we can break down its 
movement and approximately consider each short segment as uniform linear motion. As is well known, the 
speed $v$ of uniform linear motion satisfies $\Delta s = v \Delta t$. If we let $s_0$ be the initial 
distance of the tag, Equation (2) can be further written as: 

$$p = \begin{cases} 0, & K \Delta t + B < 0 \\ K \Delta t + B, & 0 \le K \Delta t + B \le 1 \\ 1, & K \Delta t + B > 1 \end{cases} \qquad (3)$$

where $K = \pm kv$ (taking the negative sign when the value of $p$ increases and the positive sign 
otherwise), and $B = k(s_0 - S)$. 
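
The step from Equation (2) to Equation (3) can be made explicit by substituting the distance under uniform linear motion into the linear case of Equation (2). The derivation below is a sketch; the numeric values in the worked example are hypothetical, and $k$ is taken to be negative, as implied by $b = -kS$ and $p = 0$ for $s > S$.

```latex
\begin{align*}
  % Uniform linear motion from initial distance s_0
  % (upper sign: tag approaching the reader, lower sign: tag receding)
  s &= s_0 \mp v\,\Delta t \\
  % Substitute into the linear case of Equation (2)
  p &= k(s - S) = k(s_0 - S) \mp k v\,\Delta t = B + K\,\Delta t,
      \qquad K = \mp k v,\quad B = k(s_0 - S) \\
  % Hypothetical values: k = -0.25 m^{-1}, S = 6 m, s_0 = 5 m, v = 1 m/s, approaching
  B &= -0.25\,(5 - 6) = 0.25, \qquad K = -k v = 0.25\ \mathrm{s^{-1}} \\
  % Read rate after \Delta t = 2 s
  p &= 0.25 \cdot 2 + 0.25 = 0.75 \\
  % Cross-check with Equation (2): s = 5 - 2 = 3 m
  k(s - S) &= -0.25\,(3 - 6) = 0.75
\end{align*}
```

For an approaching tag the sign of $K$ is negative relative to $kv$, so $K > 0$ and the read rate rises over time, in agreement with the sign convention stated for Equation (3).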

In practice, readers are usually subject to interference from their surroundings, including signal 
reflection, obstruction, and sudden current gain, so the read rate calculated from the Response number 
is unstable. Results obtained by processing the raw data directly may differ from the actual movement 
characteristics, so we use a weighted moving average of order $n$ to smooth the initial read rate 
sequences. Replacing the read rate sequence by its moving average eliminates 
unwanted fluctuations. Furthermore, the influence of extreme values can be reduced by employing a 
weighted moving average with appropriate weights, which yields more realistic movement features of 
the monitored items. The calculation is as follows: 

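$$p_i = \frac{w_1 p'_i + w_0 \sum_{j \in W_i,\, j \ne i} p'_j}{w_1 + (n - 1)\, w_0} \qquad (4)$$

where $p'_j$ is the raw read rate of epoch $j$, $W_i$ is the window of $n$ epochs containing epoch $i$, 
and $w_1$ and $w_0$ are the weights of the read rate of the current epoch and of the other epochs, 
respectively. 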
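
The smoothing step can be illustrated with a short sketch. It is one plausible reading of the weighted moving average of order $n$, not a definitive implementation: the window is assumed to be centered on the current epoch (hence $n$ odd), it is truncated at the ends of the sequence, and zero read rates inside the window are included. The function name is hypothetical; the default weights match the values $w_0 = 1$ and $w_1 = 2$ used in the experiments.

```python
def weighted_moving_average(rates, n, w0=1.0, w1=2.0):
    """Smooth a read-rate sequence with a centered window of n epochs.

    The current epoch gets weight w1, every other epoch in the window
    gets weight w0. Assumption: the window is centered and is simply
    truncated near the start and end of the sequence.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    smoothed = []
    for i in range(len(rates)):
        lo = max(0, i - half)
        hi = min(len(rates), i + half + 1)
        total = 0.0
        weight = 0.0
        for j in range(lo, hi):
            w = w1 if j == i else w0
            total += w * rates[j]
            weight += w
        smoothed.append(total / weight)
    return smoothed


# The isolated peak of 1.0 is pulled toward its neighbors.
smoothed = weighted_moving_average([0.2, 0.4, 1.0, 0.4, 0.2], n=3)
```

In the example, the middle value becomes (0.4 + 2 x 1.0 + 0.4) / 4 = 0.7, which shows how the weighting damps an extreme value while still favoring the current epoch.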

In the above treatment, we only discuss epochs whose raw read rate $p'_i \neq 0$. When the raw read 
rate of an epoch is $p'_i = 0$, there are two possibilities: either the tag is indeed outside the detection 
range, or a missed reading has occurred, i.e., the tag is in the detection range but was not captured 
because of interference. Accurate data cleaning needs to distinguish these two cases clearly, so we 
analyze the movement of the tag at adjacent times. The movement of tags is approximately uniform linear 
motion and satisfies Equation (3), so we can calculate the read rate $p_i$ of the tag from the value of $K$ 
and the read rate $p_{ia}$ at an adjacent time, and thereby determine whether the zero is a true value or a 
missed reading. To solve for the coefficient $K$, we denote $epoch_j = \{t_j, p_j\}$, where $t_j$ and $p_j$ 
are the time and read rate of $epoch_j$ respectively, and define a training set 
$TS = \{epoch_{i+l} \mid p_{i+l} \neq 0, -m \le l \le m\}$, where the upper limit of $|TS|$ 
is $2m + 1$. The coefficient $K$ can then be solved by the method of least squares on the training set $TS$, 
which estimates the best-fitting straight line as the one that minimizes the error between the actual data 
and the estimate of the line: 

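$$K = \frac{\sum_{epoch_j \in TS} (p_j - \bar{p})(t_j - \bar{t})}{\sum_{epoch_j \in TS} (t_j - \bar{t})^2} \qquad (5)$$

$$B = \bar{p} - K \bar{t} \qquad (6)$$

where $\bar{p} = \frac{1}{|TS|} \sum_{epoch_j \in TS} p_j$ and $\bar{t} = \frac{1}{|TS|} \sum_{epoch_j \in TS} t_j$. 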
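
The least-squares step and the resulting decision for a zero read rate can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the line is fitted as read rate against time so that it matches the form $p = Kt + B$ of Equation (3), all times are measured on a common axis, and a zero epoch is treated as a missed reading when the fitted line predicts a positive read rate. The function names are hypothetical.

```python
def fit_line(training_set):
    """Least-squares fit p = K * t + B over (t, p) pairs.

    Only epochs with a non-zero read rate are used, mirroring the
    definition of the training set TS.
    """
    pts = [(t, p) for t, p in training_set if p != 0]
    if len(pts) < 2:
        raise ValueError("need at least two non-zero epochs")
    t_mean = sum(t for t, _ in pts) / len(pts)
    p_mean = sum(p for _, p in pts) / len(pts)
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("all epochs share the same time")
    sxy = sum((t - t_mean) * (p - p_mean) for t, p in pts)
    K = sxy / sxx
    B = p_mean - K * t_mean
    return K, B


def estimate_read_rate(K, B, t):
    """Read rate predicted by the fitted line, clamped to [0, 1]."""
    return min(1.0, max(0.0, K * t + B))


def is_missed_reading(K, B, t):
    """Assumed rule: a zero at time t is a missed reading if the line predicts p > 0."""
    return estimate_read_rate(K, B, t) > 0


# Epoch 2 was read with rate 0; its neighbors lie on p = 0.2 * t + 0.2.
K, B = fit_line([(0, 0.2), (1, 0.4), (2, 0.0), (3, 0.8)])
```

Here the zero at $t = 2$ is excluded from the fit, the line predicts a read rate of about 0.6 for that epoch, and the zero is therefore classified as a missed reading. A zero at $t = -3$, where the line falls below 0, would be kept as a true absence.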

3.3. Reverse Order Filling Mechanism (ROFM) 

In data stream processing, data are normally processed in order. However, if the RFID data 
stream of a tag is filled in chronological order by the above method, missed fillings easily 
occur, as shown in Figure 3(a). Figure 3 analyzes the read rate of a tag over one time period in 
detail: Figure 3(c) shows the read rate of the tag without missed readings, and Figure 3(b) shows 
the raw read rate that the reader actually recorded. For an epoch $p$ in Figure 3(a), if the 
corresponding coefficient $K_p > 0$ and the data before time $t_p$ have been missed for a long period, 
the data in the period before $t_p$ will not be filled, because the RFID data stream is processed in 
order. A simple solution is to process the RFID data stream twice, forward and backward, but 
this adds considerable computational overhead. To solve this problem, we introduce a Reverse Order 
Filling Mechanism. As soon as we detect the situation described above, the read rates of the 
corresponding part of the data stream are refilled in the reverse direction, starting from epoch $p$. 
Reverse filling continues until it reaches an epoch whose original read rate is nonzero or until the 
filled read rate reaches 0. The rest of the data are processed after that. Only the affected data are 
therefore processed twice, rather than all data, which ensures the completeness of RFID data cleaning 
without adding much computational overhead. Algorithm 1 gives a pseudo-code description of the 
BBS cleaning algorithm. 
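
The reverse filling step described above can be sketched in isolation. The code is a simplified illustration, not the full BBS algorithm: it assumes that epochs are indexed by consecutive integers used directly as the time $t$, that the line parameters $K$ and $B$ have already been fitted, and that a read rate of 0 marks an unfilled epoch. The function name `reverse_fill` is hypothetical.

```python
def reverse_fill(rates, start, K, B):
    """Fill zero read rates before index `start` in reverse order, in place.

    Walks backward from start - 1 and replaces each zero with the line
    estimate K * i + B, clamped to [0, 1]. Stops at the first epoch whose
    original read rate is non-zero, or as soon as the estimate is 0.
    Returns the number of epochs that were filled.
    """
    filled = 0
    i = start - 1
    while i >= 0 and rates[i] == 0:
        estimate = min(1.0, max(0.0, K * i + B))
        if estimate == 0:
            break  # the line says the tag was out of range here
        rates[i] = estimate
        filled += 1
        i -= 1
    return filled


# Readings start at index 6 with a rising trend (K > 0) after a long gap.
rates = [0, 0, 0, 0, 0, 0, 0.5, 0.625, 0.75]
filled = reverse_fill(rates, start=6, K=0.125, B=-0.25)
```

In this example only the three epochs immediately before the first reading are revisited, and the walk stops as soon as the fitted line reaches 0. This is the reason the mechanism is cheaper than a second full pass over the stream.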

Figure 3. The cleaning result of one-way BBS. (a) one-way BBS; (b) raw; (c) reality. 

Algorithm 1. A pseudo-code description of the BBS cleaning algorithm. 

Algorithm BBS 

Require: Tags = set of all observed tag IDs 
         TS = the training set; the upper limit of |TS| is 2m+1 

for (tag in Tags) do 
    while (GetNextEpoch()) do 
        p_i = 0 
        if (p'_i != 0) then 
            p_i ← Smoothing(w_0, w_1, n)                        // Equation 4 
            K ← CurrentSlope(TS)                                // Equation 5 
            if (p_i > p_c && K > 0 && p_{i-1} == 0) then 
                // check whether it is necessary to switch on ROFM or not 
                t_0 = t_i 
                GetEpoch(t_i + T) 
                while (t_i > t_0 || (t_i < t_0 && p_i == 0 && Readrate(K, B, p_ia, t_ia, t_i) > 0)) do 
                    if (p_i == 0) then 
                        p_i ← Readrate(K, B, p_ia, t_ia, t_i)   // Equation 3 
                    end if 
                end while 
                GetEpoch(t_0 + T) 
            end if 
        else if (… != 0) then 
            K ← CurrentSlope(TS)                                // Equation 5 
            p_i ← Readrate(K, B, p_ia, t_ia, t_i)               // Equation 3 
        else if (p_{i-1} == 0) then 
            p_i ← Readrate(K, B, p_ia, t_ia, t_i)               // Equation 3 
        end if 
    end while 
end for 



Sensors 2012, 12 10203 



4. Experimental Evaluation 

In this section, we analyze the performance of BBS on several data sets and compare 
its accuracy with that of other cleaning methods. All the experiments were conducted on an 
Intel(R) Core(TM) 2 Duo CPU T9550 @ 2.66 GHz system with 2 GB of RAM. Our data include both real 
collected data and simulated data. The laboratory equipment used for collecting data includes an Invengo 
XCRF-860 RFID UHF reader with a 902-928 MHz frequency range, an Invengo XCAF-12L antenna, and 
XCTF-8101A tags. The simulated data for our experiments were generated by a synthetic RFID data 
generator that simulates the operation of RFID readers under a wide variety of conditions. We simulate 
various movements of tags with different missing rates, where the missing rate is the probability that 
a missed reading occurs. 

4.1. Accuracy Comparison 

In this experiment, we compare the accuracy of data filled by BBS (with n = 3, n = 7 and n = 11, 
respectively), SMURF, and sliding-window methods (with window sizes of 5, 20 
and 35 epochs) under missing rates from 10% to 80%. The other experimental parameters of 
BBS are set as follows: $m = 7$, $w_0 = 1$ and $w_1 = 2$. We clean the same raw data with each method. 
By comparing each cleaning result with the real data, we obtain the error rate of each method. 
As shown in Figure 4, the error rate of BBS is lower than that of the sliding-window methods in all cases. 
We found that the choice of the parameter $n$ has some impact on the experimental results when 
the missing rate is greater than 70%. Therefore, in practical applications, the parameters $n$, $m$, $w_0$ 
and $w_1$ should be set to values appropriate to the actual needs in order to obtain optimal cleaning results. 
Usually, the more unstable the read rate sequence, the larger the value of $n$ should be; the higher the 
missing rate, the larger the value of $m$ should be. 
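
To make the comparison concrete, the sketch below shows a fixed sliding-window baseline and an error-rate measure. Both are assumptions for illustration: the baseline marks a tag as present in an epoch if it was read at least once in the trailing window, and the error rate is taken to be the fraction of epochs in which the cleaned presence differs from reality. The source does not spell out either definition, and the function names are hypothetical.

```python
def sliding_window_clean(present, window):
    """Fixed-window smoothing over a sequence of booleans (tag read or not).

    Assumed rule: the tag counts as present in epoch i if it was read in
    any of the last `window` epochs, including epoch i itself.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    cleaned = []
    for i in range(len(present)):
        lo = max(0, i - window + 1)
        cleaned.append(any(present[lo:i + 1]))
    return cleaned


def error_rate(cleaned, reality):
    """Fraction of epochs where the cleaned result disagrees with reality."""
    if len(cleaned) != len(reality):
        raise ValueError("sequences must have the same length")
    if not reality:
        raise ValueError("sequences must not be empty")
    wrong = sum(c != r for c, r in zip(cleaned, reality))
    return wrong / len(reality)


reality = [False, True, True, True, True, False, False, False]
raw = [False, True, False, True, True, False, False, False]  # one missed reading
cleaned = sliding_window_clean(raw, window=2)
```

In this small example the window repairs the missed reading at index 2 but keeps the tag present for one epoch after it has left, so the error rate stays at 1/8. Larger windows fill longer gaps at the price of more such false positives, which is the trade-off a fixed window cannot avoid.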



Figure 4. Accuracy comparison under different missing rates: (a) missing rate = 10%; 
(b) missing rate = 20%; (c) missing rate = 30%; (d) missing rate = 40%; (e) missing rate = 50%; 
(f) missing rate = 60%; (g) missing rate = 70%; (h) missing rate = 80%. 

We also compare the accuracy of data filled by different methods under different tag speeds. The error 
rates obtained are used to compare the accuracy of the methods, where a lower error rate means higher 
accuracy. As shown in Figure 5, the results of BBS are superior to those of all other methods, 
especially when the tag speed is higher than 1.0 m/s. 

Figure 5. Accuracy comparison under different tag speeds (x-axis: tag speed in m/s, from 0.2 to 4.0). 






Furthermore, we analyze one time period in detail (missing rate = 50%, length 
1,000 epochs). In Figure 6, Reality refers to the readings that would have been produced 
by a perfect reader without missed readings, Raw refers to the raw data that the reader actually read, and 
the others refer to the data filled by four data cleaning methods (BBS, SMURF, 
5-epoch sliding window, and 35-epoch sliding window). Bold horizontal lines indicate epochs in which the 
tag is present or read; gaps indicate that it is absent or not read. The line at the bottom of Figure 6 is 
the real read rate without missed readings, and the line above it is the read rate estimated by BBS 
($n = 7$, $w_0 = 1$ and $w_1 = 2$). Compared with the sliding-window methods, BBS greatly improves the 
accuracy of RFID data cleaning. In particular, BBS not only accurately determines whether the tag is in 
the read range of the reader, but also gives the read rate of each epoch. BBS thus makes it possible to 
obtain a more exact position of the tag. 

Figure 6. Analysis in detail. 




4.2. Performance Comparison 

To verify the validity and necessity of the Reverse Order Filling Mechanism, we design the following 
experiment. We focus on readings produced by a single tag with missing rates from 10% 
to 80% over 10,000 epochs. We process the 8 data sets with two different methods, the Reverse Order 
Filling Mechanism and the Twice Scanning Method, and assess their performance by 
comparing the response times. As shown in Figure 7, the Reverse Order Filling Mechanism only needs 
to process the affected data a second time, while the Twice Scanning Method needs to process 
all data twice. Therefore, the former is clearly more efficient than the latter at 
processing data. 



Figure 7. Performance comparison. 

5. Conclusions 

Accurate data cleaning is an essential task for the successful deployment of RFID systems. In this 
paper, we have proposed BBS, a behavior-based smoothing system for unreliable RFID data, which 
takes advantage of the kinematic characteristics of tags to assist in RFID data cleaning. A movement 
behavior detection model is proposed to establish the conversion relationship between RFID data and 
the kinematic parameters of the tags. We then reduce the influence of extreme values and other unwanted 
fluctuations by employing a weighted moving average of order $n$. Moreover, a Reverse Order Filling 
Mechanism (ROFM) is proposed for BBS to capture the movement behavior characteristics of tags 
more completely. Finally, we validate our solution with a common RFID application and 
demonstrate the advantages of our approach through extensive simulations. 